Analysis module 1: Run statistics, filtering, controls and 10% validation RoVI study — 05 November, 2020

Run statistics

Basic statistics

Number of runs: 17
Number of samples: 4317
Number of controls: 396

Sample type by country (includes replicates)

     
      India Malawi  UK
  BM1   334    133  92
  BM2   301     97  42
  BM3   298     89  41
  BS1   354    130 104
  BS2   316     86  64
  BS3   318     79  58
  BS4     0     95  60
  BS5   336    104  88
  BS6     0     81  61
  MS1   337    121  98

Input read count by sequencing run

% read retention by sequencing run

% read retention by sample type

Minimum, median, and maximum output counts by sequencing run

           run   n min      med     max
1    LIMS12416  53  85 111203.0 7127326
2    LIMS12651 170 316 177221.5 3919893
3  LIMS14462l1 370   0  86285.5  756928
4  LIMS14462l2 364   0  84133.0 1839299
5    LIMS14801  95   0  73892.0  333405
6    LIMS15089 370   3 110796.5 3748639
7    LIMS15168  89  32 513390.0 1060421
8    LIMS15350 323   0 156639.0  661343
9    LIMS15914 348   0 122788.5  534346
10   LIMS15990 356   0 127734.0  422886
11   LIMS16518 357   0  48510.0 1534696
12   LIMS16519 354   0  87388.5 1945706
13   LIMS17407 343   0 126279.0  921853
14   LIMS18668 370   0 119663.0 1057161
15   LIMS18669 367   0 113045.0  632275
16        p1p2 192   0  58134.0   89725
17        p3p4 192   0  57241.5   90699

Minimum, median, and maximum output counts by sample type

    sample group    n min      med     max
1     breastmilk 1427   0  88021.0 3748639
2           ctrl  396   0  66137.0 7127326
3   infant stool 2334   0 119591.0 3919893
4 maternal stool  556  78 102317.5 1569457

Filtering

Step 1: load unfiltered feature table - ps1

  • n samples: 4313
  • n controls: 367
  • n features: 81278

Step 2: filter by length (keep sequences of 390-440 bp) - ps2

  • n samples: 4313
  • n controls: 367
  • n features: 69094
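
The length filter in Step 2 can be illustrated with a minimal Python sketch. This is not the code used for the report: the function name `filter_by_length`, the toy feature IDs and sequences are hypothetical, and treating the 390-440 bp bounds as inclusive is an assumption.

```python
def filter_by_length(features, min_len=390, max_len=440):
    """Keep features whose sequence length lies in [min_len, max_len].

    features: dict mapping a feature ID to its nucleotide sequence.
    Inclusive bounds are an assumption; the report only states "390-440bp".
    """
    return {fid: seq for fid, seq in features.items()
            if min_len <= len(seq) <= max_len}


# Hypothetical toy features: one too short, one in range, one too long.
toy_features = {
    "rsv_short": "A" * 250,
    "rsv_ok": "ACGT" * 100,   # 400 bp
    "rsv_long": "A" * 460,
}
kept = filter_by_length(toy_features)
print(sorted(kept))
```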

Step 3: filter non-bacterial sequences - ps3

  • n samples: 4313
  • n controls: 366
  • n features: 61147

Step 4: remove taxa not present at ≥0.1% relative abundance in at least 2 samples - ps4

  • n samples: 4313
  • n controls: 360
  • n features: 9938
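
The Step 4 rule (keep a taxon only if it reaches ≥0.1% abundance in at least 2 samples) can be sketched in plain Python as below. The function name `filter_taxa`, the nested-dict layout and the toy counts are hypothetical; computing abundance relative to each sample's total read count is an assumption about how the 0.1% threshold was applied.

```python
def filter_taxa(counts, min_rel_abund=0.001, min_samples=2):
    """Keep taxa reaching >= min_rel_abund in >= min_samples samples.

    counts: dict mapping sample ID -> dict mapping taxon -> read count.
    Returns the same structure restricted to the retained taxa.
    """
    hits = {}
    for taxa in counts.values():
        total = sum(taxa.values())
        if total == 0:
            continue  # an empty sample cannot support any taxon
        for taxon, n in taxa.items():
            if n / total >= min_rel_abund:
                hits[taxon] = hits.get(taxon, 0) + 1
    keep = {taxon for taxon, k in hits.items() if k >= min_samples}
    return {sample: {t: n for t, n in taxa.items() if t in keep}
            for sample, taxa in counts.items()}


# Hypothetical toy table: taxC is abundant in one sample only, so it is dropped.
toy_counts = {
    "s1": {"taxA": 900, "taxB": 100, "taxC": 0},
    "s2": {"taxA": 500, "taxB": 500, "taxC": 0},
    "s3": {"taxA": 100, "taxB": 0, "taxC": 9990},
}
filtered = filter_taxa(toy_counts)
print(sorted(filtered["s3"]))
```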

Step 5: filter contaminants - ps5

Read counts in extraction controls (WC) and negative-template PCR controls - stool

List of stool extraction controls with ≥10,000 sequences

[1] "tLIMS12651p1s90WCSctrl"            "tLIMS14462l1s277I077toI096WCSctrl"
[3] "tLIMS14462l1s366I058toI076WCSctrl" "tLIMS14462l1s90I038toI056WCSctrl" 
[5] "tLIMS18669s150WCSIctrl14"          "tLIMS18669s160WCSIctrl27"         

Read counts in extraction controls (WC) and negative-template PCR controls - breastmilk

Summary

  • All PCR-negative controls clear of significant amplification.
  • For stool extraction controls, 4 Indian extraction pools yielded ≥10,000 reads.
  • When individual extraction controls related to these pools were sequenced, two yielded substantial amplification (batches 14 and 27).
  • For breastmilk extraction controls, amplification seen in majority of samples - too many to identify cross-contaminated samples on a case by case basis.

Composition of stool extraction controls yielding amplification

Summary

  • Pools fall into two clusters: one associated with extraction batch 14, one with extraction batch 27 (containing 6 and 2 samples, respectively).
  • Therefore, it is likely that observed amplification in Indian extraction controls reflects cross-contamination from these two extraction batches.
  • 8 samples from these two extraction batches removed from the analysis.

Remove contaminated samples

Stool samples/controls

  • n samples: 4302
  • n controls: 360
  • n features: 9938

Nanodrop readings by country

Removal of contaminants from infant stools

  • Contaminants identified using the frequency-based method in the decontam package with the default p value threshold of 0.1.
  • Only samples sequenced in Liverpool included, as these form the basis for all primary analyses.
  • Identification of contaminants performed separately for each country (samples and extraction controls) and minimum p value taken (provided taxon observed at least 10 times).
  • Negative controls and extraction controls assigned an arbitrary nanodrop reading of 0.001 ng/µl.
  • n samples: 2090
  • n controls: 54
  • n taxa: 4958
  • n contaminants: 8
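
The per-country combination rule described above (take the minimum p value across countries, then compare it with the 0.1 threshold) can be sketched as follows. This Python sketch does not recompute the decontam scores themselves; the function name `flag_contaminants` and all p values and counts in the toy input are hypothetical. Reading "observed at least 10 times" as "present in at least 10 samples within that country" is an assumption.

```python
def flag_contaminants(p_by_country, n_obs_by_country, threshold=0.1, min_obs=10):
    """Combine per-country contaminant p values by taking the minimum.

    p_by_country: dict country -> dict taxon -> p value (or None if not scored).
    n_obs_by_country: dict country -> dict taxon -> number of observations.
    A country's p value is used only if the taxon was observed at least
    min_obs times there (assumed interpretation of the report's rule).
    Returns dict taxon -> (minimum eligible p value, is_contaminant).
    """
    taxa = set()
    for pvals in p_by_country.values():
        taxa.update(pvals)
    result = {}
    for taxon in sorted(taxa):
        eligible = [
            pvals[taxon]
            for country, pvals in p_by_country.items()
            if pvals.get(taxon) is not None
            and n_obs_by_country.get(country, {}).get(taxon, 0) >= min_obs
        ]
        if eligible:
            p_min = min(eligible)
            result[taxon] = (p_min, p_min < threshold)
    return result


# Hypothetical inputs, for illustration only.
toy_p = {
    "India": {"t1": 0.03, "t2": 0.50},
    "Malawi": {"t1": 0.40, "t2": 0.02},
    "UK": {"t1": 0.90, "t2": 0.60},
}
toy_n = {
    "India": {"t1": 50, "t2": 40},
    "Malawi": {"t1": 30, "t2": 4},   # t2 seen too rarely to count here
    "UK": {"t1": 12, "t2": 11},
}
flags = flag_contaminants(toy_p, toy_n)
print(flags)
```

In the toy input, the low Malawi p value for `t2` is ignored because the taxon is observed fewer than 10 times there, so only `t1` is flagged.
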
Profile of most abundant contaminants

Removal of contaminants from maternal stools

  • n samples: 465
  • n controls: 54
  • n taxa: 4353
  • n contaminants: 4

Profile of most abundant contaminants

Removal of contaminants from breastmilk samples

  • n samples: 1311
  • n controls: 63
  • n taxa: 7328
  • n contaminants: 22

Profile of most abundant contaminants

Summary of filtered phyloseq object

  • n samples: 4302
  • n controls: 360
  • n features: 9938

Step 6: remove duplicates (including validation samples sequenced in London) - ps6

  • n samples: 3796
  • n controls: 340
  • n features: 9850

Select rarefaction depth

Lines display depths of 10,000, 25,000 and 50,000 sequences. Abbreviations: BM, breastmilk; BS, baby stool; MS, maternal stool.

      n d10k d25k d50k
BM 1325 1282 1206 1008
BS 2025 1979 1971 1927
MS  446  442  440  428

Rarefaction depth of 25,000 sequences per sample retains 97.3% of infant samples, 98.7% of maternal samples, and 91% of breastmilk samples.
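
The retention percentages at the 25,000-sequence depth can be recomputed from the depth table above. The short Python check below copies the counts from that table; the names `depth_table` and `retention_pct` are hypothetical.

```python
# Counts copied from the rarefaction depth table (n = samples before thresholding).
depth_table = {
    "BM": {"n": 1325, "d10k": 1282, "d25k": 1206, "d50k": 1008},
    "BS": {"n": 2025, "d10k": 1979, "d25k": 1971, "d50k": 1927},
    "MS": {"n": 446, "d10k": 442, "d25k": 440, "d50k": 428},
}


def retention_pct(table, column):
    """Percentage of samples retained at the given depth column, to 1 d.p."""
    return {group: round(100 * row[column] / row["n"], 1)
            for group, row in table.items()}


retained_25k = retention_pct(depth_table, "d25k")
print(retained_25k)  # {'BM': 91.0, 'BS': 97.3, 'MS': 98.7}
```

The same function applied to `"d50k"` shows the cost of a deeper threshold, which falls mostly on breastmilk samples.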

Step 7: remove samples with <25,000 sequences - ps7

  • n samples: 3617
  • n controls: 220
  • n features: 9841

Step 8: rarefy to 25,000 sequences - ps8

  • n samples: 3617
  • n controls: 220
  • n features: 9841
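
Rarefying to a fixed depth means randomly subsampling each sample's reads down to the same total. The Python sketch below illustrates one common variant, subsampling without replacement; whether the report's pipeline sampled with or without replacement is not stated, so that choice, the function name `rarefy`, the seed and the toy counts are all assumptions.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample a sample's reads to exactly `depth` without replacement.

    counts: dict mapping taxon -> read count for one sample.
    Returns a new dict of subsampled counts, or None if the sample has
    fewer than `depth` reads (such samples are dropped, as in Step 7).
    """
    if sum(counts.values()) < depth:
        return None
    # Expand to one entry per read, then draw `depth` reads at random.
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    drawn = random.Random(seed).sample(pool, depth)
    rarefied = {}
    for taxon in drawn:
        rarefied[taxon] = rarefied.get(taxon, 0) + 1
    return rarefied


# Hypothetical sample with 90,010 reads, rarefied to 25,000.
toy_sample = {"taxA": 60000, "taxB": 30000, "taxC": 10}
rarefied = rarefy(toy_sample, 25000)
print(sum(rarefied.values()))
```

Rare taxa such as `taxC` may be lost entirely in the draw, which is why taxon counts per sample type can fall after rarefaction even though no samples are removed.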

Filtering statistics

                        n_samples n_taxa total_count   min   mean     sd
ps1 (unfiltered)             4680  81278   599439742     2 128085 177663
ps2 (length)                 4680  69094   596953865     2 127554 177552
ps3 (taxonomy)               4679  61147   593399964     2 126822 177189
ps4 (≥0.1% in >1)            4673   9938   580590920     2 124244 174834
ps5 (decontam)               4662   9927   578415137     2 124070 174856
ps6 (no duplicates)          4136   9850   537792259     2 130027 183808
ps7 (samples with ≥25k)      3837   9841   535756229 25051 139629 187451
ps8 (rarefied to 25k)        3837   9841    95925000 25000  25000      0

Note: n_samples in this table counts samples and controls together (e.g. ps1: 4313 + 367 = 4680; ps7: 3617 + 220 = 3837).

Statistics by sample type

                     nsamples ntaxa total_count min     av     sd
ps1 (infant stool)       1856 10568   269774889   4 145353 142163
ps1 (maternal stool)      466 21698    62211349  78 133501 107906
ps1 (breastmilk)         1334 50579   146574829  13 109876 147375

Boxplots of filtering statistics by sample type

Filtering retention by sample type - ps5 (decontam)

Mean retention % after taxon filtering (ps5):

  • Infant stools = 99.9
  • Maternal stools = 96.1
  • Breastmilk = 93.3

Statistics by sample type - filtered and deduplicated

                     nsamples ntaxa total_count   min     av     sd
ps7 (infant stool)       1698  4606   250812155 25070 147710 145590
ps7 (maternal stool)      440  4297    57114185 29464 129805 103313
ps7 (breastmilk)         1206  7236   130788689 25051 108448 141470

Statistics by sample type - all faecal samples

                nsamples ntaxa total_count   min     av     sd
ps7 (all stool)     2138  6282   307926340 25070 144025 138121

Statistics by sample type - rarefied

                     nsamples ntaxa total_count min_count av_count sd
ps8 (infant stool)       1698  4421    42450000     25000    25000  0
ps8 (maternal stool)      440  4236    11000000     25000    25000  0
ps8 (breastmilk)         1206  7214    30150000     25000    25000  0

Summary of filtered taxa

Ten most frequent taxonomic assignments displayed for each group. Remaining taxa grouped as ‘other’. Bar heights represent the proportion of RSVs assigned to each taxon, independent of their relative abundance.

Infant stool

Maternal stool

Breastmilk

Run-to-run variation

Stool

Alpha and beta diversity in positive controls

Variation explained by run for each sample type

Sample subsets in which PERMANOVA (adonis) p value <0.05 for either weighted or unweighted analyses

   country sample_type  R2_w   p_w  R2_u   p_u n_runs
2    India      week 4 0.034 0.026 0.023 0.301      7
5    India      mother 0.046 0.001 0.035 0.027      7
15  Malawi      mother 0.043 0.155 0.061 0.036      4
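
The R2 and p values in the table above come from a permutation test on a distance matrix with sequencing run as the grouping factor. A minimal pure-Python sketch of a one-factor PERMANOVA is given below to show how such values arise. It is not the code used for the report: the function name `permanova`, the number of permutations, the seed and the toy data are hypothetical, and the toy distances are simple absolute differences rather than the distance measures behind the weighted and unweighted analyses.

```python
import random


def permanova(dist, groups, n_perm=999, seed=0):
    """One-factor PERMANOVA on a square symmetric distance matrix.

    dist: list of lists of pairwise distances; groups: one label per sample.
    Returns (R2, pseudo-F, permutation p value).
    """
    n = len(groups)
    # Total sum of squares from all pairwise squared distances.
    ss_total = sum(dist[i][j] ** 2
                   for i in range(n) for j in range(i + 1, n)) / n

    def stats(labels):
        members = {}
        for idx, label in enumerate(labels):
            members.setdefault(label, []).append(idx)
        # Within-group sum of squares, group by group.
        ss_within = 0.0
        for idxs in members.values():
            pair_sum = sum(dist[a][b] ** 2
                           for k, a in enumerate(idxs) for b in idxs[k + 1:])
            ss_within += pair_sum / len(idxs)
        ss_between = ss_total - ss_within
        n_groups = len(members)
        pseudo_f = (ss_between / (n_groups - 1)) / (ss_within / (n - n_groups))
        return ss_between / ss_total, pseudo_f

    r2, f_obs = stats(groups)
    rng = random.Random(seed)
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # permute run labels across samples
        if stats(labels)[1] >= f_obs:
            hits += 1
    return r2, f_obs, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: two "runs" whose samples are clearly separated.
values = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3]
toy_dist = [[abs(a - b) for b in values] for a in values]
toy_runs = ["runA"] * 4 + ["runB"] * 4
r2, f_obs, p_value = permanova(toy_dist, toy_runs)
print(round(r2, 3), p_value)
```

In the toy data nearly all variation lies between the two groups, so R2 is close to 1. The R2 values in the report (roughly 0.02 to 0.06) indicate that run explains only a small share of variation even where the p value is below 0.05.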

Breastmilk

Alpha and beta diversity in positive controls

Variation explained by run for each sample type

Sample subsets in which PERMANOVA p value <0.05 for either weighted or unweighted analyses

  country sample_type  R2_w   p_w  R2_u   p_u n_runs
1   India      week 1 0.014 0.199 0.029 0.001      4
2   India    week 7/9 0.035 0.001 0.036 0.001      4
3   India  week 11/13 0.031 0.001 0.038 0.001      4
8  Malawi    week 7/9 0.029 0.142 0.054 0.001      3
9  Malawi  week 11/13 0.032 0.143 0.040 0.006      3

10% validation

Stool

Input data

ps5_validation
  • biom table containing paired Liverpool/London stool samples in which both have at least 25,000 reads
  • nsamples = 486
  • ntaxa = 356

Alpha diversity (rarefied to 25,000 sequences per sample)

Beta diversity plots

Correlation of genus abundances for top-20 genera

Breastmilk

Input data

ps5_validation
  • biom table containing paired Liverpool/London breastmilk samples in which both have at least 25,000 reads
  • nsamples = 158
  • ntaxa = 537

See outputs of analysis module 1 for further details of feature table filtering process.

Alpha diversity (rarefied to 25,000 sequences per sample)

Beta diversity plots

Correlation of genus abundances for top-20 genera

Unsupervised clustering

Clustering of samples (columns) and taxa (rows) - presence/absence at genus level

Number of taxa present at ≥0.1% abundance in ≥1% of samples: 157

Sample availability

Heatmap of sample availability for primary analyses

Sample availability by country

        BB1_IgA BB2_IgA MB1_IgA BM1_IgA BS3_A1AT BS3_MPO BB1_A1AG BS5_A1AT
India       305     305     305     301      301     301      304      298
India_e     166     166     166     163      163     163      165      162
India_u     138     138     138     137      136     136      138      134
Malawi      103     103     103      78       83      83       87       63
UK           56      54      49      30       48      47        0       51
        BS5_MPO MS1 BS1 BS2 BS3 BS5   n
India       298 288 289 287 289 282 307
India_e     162 156 155 156 154 160 166
India_u     134 129 131 128 132 120 138
Malawi       63  83  81  75  68  61 119
UK           51  57  59  59  53  55  60

Sample availability by seroconversion status

           BB1_IgA BB2_IgA MB1_IgA BM1_IgA BS3_A1AT BS3_MPO BB1_A1AG BS5_A1AT
India_r         85      85      85      83       83      83       84       83
India_nr       220     220     220     218      217     217      220      214
India_r_e       51      51      51      50       50      50       50       50
India_nr_e     115     115     115     113      113     113      115      112
India_r_u       34      34      34      33       33      33       34       33
India_nr_u     104     104     104     104      103     103      104      101
Malawi_r        24      24      24      16       16      16       19       12
Malawi_nr       79      79      79      62       62      62       68       41
UK_r            27      27      23      13       23      22        0       24
UK_nr           24      24      19      14       20      20        0       22
           BS5_MPO MS1 BS1 BS2 BS3 BS5   n
India_r         83  77  79  76  79  79  85
India_nr       214 209 208 209 208 202 220
India_r_e       50  46  47  46  46  49  51
India_nr_e     112 110 108 110 108 111 115
India_r_u       33  31  32  30  33  30  34
India_nr_u     101  98  99  98  99  90 104
Malawi_r        12  16  18  15  12   9  24
Malawi_nr       41  58  55  56  54  44  79
UK_r            24  25  27  26  25  25  27
UK_nr           20  23  23  24  23  24  24

Sample availability by dose 1 shedding status

           BB1_IgA BB2_IgA MB1_IgA BM1_IgA BS3_A1AT BS3_MPO BB1_A1AG BS5_A1AT
India_r         81      81      81      81       80      80       81       81
India_nr       222     222     222     218      219     219      221      215
India_r_e       29      29      29      29       28      28       29       29
India_nr_e     135     135     135     132      133     133      134      131
India_r_u       52      52      52      52       51      51       52       51
India_nr_u      86      86      86      85       85      85       86       83
Malawi_r        45      45      45      36       34      34       41       32
Malawi_nr       40      40      40      28       37      37       34       25
UK_r            52      49      46      26       44      44        0       48
UK_nr            4       5       3       4        4       3        0        3
           BS5_MPO MS1 BS1 BS2 BS3 BS5   n
India_r         81  73  73  77  76  74  82
India_nr       215 213 214 208 211 206 223
India_r_e       29  26  26  28  26  26  29
India_nr_e     131 128 127 126 126 132 135
India_r_u       51  46  46  48  49  47  52
India_nr_u      83  83  85  80  83  73  86
Malawi_r        32  42  39  32  25  33  56
Malawi_nr       25  27  29  32  32  21  45
UK_r            48  53  55  54  49  51  55
UK_nr            3   4   4   5   4   4   5

Sample availability by shedding at either dose

           BB1_IgA BB2_IgA MB1_IgA BM1_IgA BS3_A1AT BS3_MPO BB1_A1AG BS5_A1AT
India_r        150     150     150     149      147     147      150      147
India_nr       152     152     152     149      151     151      151      148
India_r_e       65      65      65      65       63      63       65       62
India_nr_e      99      99      99      96       98      98       98       98
India_r_u       84      84      84      83       82      82       84       83
India_nr_u      53      53      53      53       53      53       53       50
Malawi_r        41      41      41      29       35      35       36       28
Malawi_nr       20      20      20      13       19      19       19       16
UK_r            50      49      46      26       44      44        0       48
UK_nr            2       2       0       1        2       1        0        1
           BS5_MPO MS1 BS1 BS2 BS3 BS5   n
India_r        147 140 139 142 142 135 151
India_nr       148 145 147 142 144 144 153
India_r_e       62  61  60  62  60  61  65
India_nr_e      98  93  93  92  92  97  99
India_r_u       83  77  77  78  80  72  84
India_nr_u      50  51  53  49  51  47  53
Malawi_r        28  34  31  30  27  30  50
Malawi_nr       16  14  16  15  16  12  22
UK_r            48  53  53  54  47  51  54
UK_nr            1   1   2   2   2   2   2

Sample availability by any response status

           BB1_IgA BB2_IgA MB1_IgA BM1_IgA BS3_A1AT BS3_MPO BB1_A1AG BS5_A1AT
India_r        181     181     181     179      176     176      180      175
India_nr       121     121     121     119      121     121      121      119
India_r_e       87      87      87      86       84      84       86       83
India_nr_e      77      77      77      75       77      77       77       77
India_r_u       93      93      93      92       91      91       93       91
India_nr_u      44      44      44      44       44      44       44       42
Malawi_r        43      43      43      29       33      33       38       25
Malawi_nr       18      18      18      13       17      17       17       12
UK_r            46      46      40      23       40      40        0       43
UK_nr            2       2       0       1        2       1        0        1
           BS5_MPO MS1 BS1 BS2 BS3 BS5   n
India_r        175 167 168 167 169 164 181
India_nr       119 116 116 115 115 114 121
India_r_e       83  81  81  82  79  83  87
India_nr_e      77  73  72  72  73  75  77
India_r_u       91  85  86  84  89  80  93
India_nr_u      42  43  44  43  42  39  44
Malawi_r        25  30  27  29  27  25  43
Malawi_nr       12  12  14  13  15  11  18
UK_r            41  45  45  46  43  45  46
UK_nr            1   1   2   2   2   2   2

Sample availability by post-ORV IgA status

        BB1_IgA BB2_IgA MB1_IgA BM1_IgA BS3_A1AT BS3_MPO BB1_A1AG BS5_A1AT
India       305     305     305     301      300     300      304      297
India_e     166     166     166     163      163     163      165      162
India_u     138     138     138     137      136     136      138      134
Malawi      103     103     103      78       78      78       87       53
UK           51      54      45      29       43      42        0       48
        BS5_MPO MS1 BS1 BS2 BS3 BS5   n
India       297 286 287 285 287 281 305
India_e     162 156 155 156 154 160 166
India_u     134 129 131 128 132 120 138
Malawi       53  74  73  71  66  53 103
UK           46  51  53  53  48  51  54